An investigation on scaling parameter and distance metrics in semi-supervised Fuzzy c-means
Abstract
The scaling parameter α helps maintain a balance between supervised and unsupervised learning in semi-supervised Fuzzy c-Means (ssFCM). In this study, we investigated the effects of different α values (0.1, 0.5, 1 and 10) in Pedrycz and Waletzky’s ssFCM with various amounts of labelled data (10%, 20%, 30%, 40%, 50% and 60%) and three distance metrics (Euclidean, Mahalanobis and kernel-based) on the Nottingham Tenovus Breast Cancer dataset and five popular UCI datasets. Higher α values were found to produce better accuracy using Euclidean distance on four of the six datasets. For Mahalanobis distance, increasing α improved accuracy up to α = 1 but not at α = 10 in three of the six datasets. For kernel-based distance, accuracy tended to decrease with increasing α, which was observed in four of the six datasets. Such trends in the effects of α values on the classification results using different distance metrics and datasets can be established to form a guide in the selection of α. Care should be taken in selecting the α value, as a suitable value depends on the distance metric, particularly the Mahalanobis and kernel-based distance metrics, and on the dataset used.
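To make the role of α concrete, the following is a minimal sketch of the membership update commonly stated for Pedrycz and Waletzky's ssFCM with fuzzifier m = 2. It is an illustration under stated assumptions, not the implementation used in the study: the function name `ssfcm_memberships` and the argument names `dist2`, `F` and `b` are hypothetical, and the squared distances are assumed to have been computed already with whichever metric (Euclidean, Mahalanobis or kernel-based) is being tested.

```python
import numpy as np

def ssfcm_memberships(dist2, F, b, alpha):
    """One membership update for ssFCM, assuming fuzzifier m = 2.

    dist2 : (c, n) squared distances from c cluster centres to n patterns
    F     : (c, n) label memberships (one-hot column for a labelled pattern)
    b     : (n,) indicator, 1 for labelled patterns and 0 for unlabelled ones
    alpha : scaling parameter weighting the supervised term
    """
    dist2 = np.maximum(dist2, 1e-12)           # guard against division by zero
    inv = 1.0 / dist2
    # Standard (unsupervised) FCM membership for m = 2.
    fcm_part = inv / inv.sum(axis=0, keepdims=True)
    # Total label membership carried by each pattern (0 if unlabelled).
    labelled_mass = b * F.sum(axis=0)
    # Blend the unsupervised term with the label term; columns still sum to 1.
    return (fcm_part * (1.0 + alpha * (1.0 - labelled_mass))
            + alpha * F * b) / (1.0 + alpha)

# Two clusters, two equidistant patterns; only pattern 0 is labelled (cluster 0).
dist2 = np.array([[1.0, 1.0], [1.0, 1.0]])
F = np.array([[1.0, 0.0], [0.0, 0.0]])
b = np.array([1.0, 0.0])
for alpha in (0.1, 0.5, 1.0, 10.0):
    print(alpha, ssfcm_memberships(dist2, F, b, alpha)[:, 0])
```

In this sketch an unlabelled pattern always receives the plain FCM membership, while a labelled pattern is pulled towards its label more strongly as α grows, which is the balance between supervised and unsupervised learning described above.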
Similar resources
Investigating Distance Metrics in Semi-supervised Fuzzy c-Means for Breast Cancer Classification
In previous work, semi-supervised Fuzzy c-means (ssFCM) was used as an automatic classification technique to classify the Nottingham Tenovus Breast Cancer (NTBC) dataset as no method to do this currently exists. However, the results were poor when compared with semi-manual classification. It is known that the NTBC data is highly non-normal and it was suspected that this affected the poor result...
An exploration of improvements to semi-supervised fuzzy c-means clustering for real-world biomedical data
This thesis explores various detailed improvements to semi-supervised learning (using labelled data to guide clustering or classification of unlabelled data) with fuzzy c-means clustering (a ‘soft’ clustering technique which allows data patterns to be assigned to multiple clusters using membership values), with the primary aim of creating a semi-supervised fuzzy clustering algorithm that shows ...
A methodology for automatic classification of breast cancer immunohistochemical data using semi-supervised Fuzzy c-means
Previously, a semi-manual method was used to identify six novel and clinically useful classes in the Nottingham Tenovus Breast Cancer dataset. 663 out of 1076 patients were classified. The objectives of our work are threefold. Firstly, our primary objective is to use one single automatic method (post-initialisation) to reproduce the six classes for the 663 patients and to classify the remainin...
Enhancement of fuzzy clustering by mechanisms of partial supervision
Semi-supervised (or partial) fuzzy clustering plays an important and unique role in discovering hidden structure in data realized in presence of a certain quite limited fraction of labeled patterns. The objective of this study is to investigate and quantify the effect of various distance functions (distances) on the performance of the clustering mechanisms. The underlying goal of endowing the c...
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...